Applying Relation Extraction and Graph Matching to Answering Multiple Choice Questions
Shimoda, Naoki, Yamamoto, Akihiro
In this research, we combine Transformer-based relation extraction with knowledge graph (KG) matching and apply them to answering multiple-choice questions (MCQs) while maintaining the traceability of the output process. KGs are structured representations of factual knowledge consisting of entities and relations. Because of their high construction cost, they have long been regarded as static databases of validated links. However, recent Transformer-based relation extraction (RE) methods can generate KGs dynamically from natural language text, opening the possibility of representing the meaning of input sentences with the created KGs. Exploiting this capability, we propose a method that answers MCQs in the "fill-in-the-blank" format, accounting for the fact that RE methods produce KGs encoding false information when given factually incorrect text. We measure the truthfulness of each question sentence by (i) converting the sentence into a relational graph using an RE method and (ii) verifying that graph against factually correct KGs under the closed-world assumption. Experimental results show that our method correctly answers up to around 70% of the questions while keeping the procedure traceable, and that the question category strongly influences accuracy.
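Below is a minimal sketch of the verification step the abstract describes, assuming triples of the form (head, relation, tail); the toy reference KG, the truthfulness score (fraction of extracted triples found in the KG), and the option-selection rule are illustrative assumptions, not the authors' exact implementation.

```python
# Hypothetical sketch: verify extracted triples against a reference KG
# under the closed-world assumption (a triple absent from the KG is false).

Triple = tuple[str, str, str]  # (head, relation, tail)

# Toy reference KG of validated facts (illustrative only).
REFERENCE_KG: set[Triple] = {
    ("Kyoto", "locatedIn", "Japan"),
    ("Tokyo", "capitalOf", "Japan"),
}

def truthfulness(extracted: set[Triple], kg: set[Triple]) -> float:
    """Fraction of extracted triples supported by the KG (0 if none extracted)."""
    if not extracted:
        return 0.0
    return len(extracted & kg) / len(extracted)

def answer_mcq(options: dict[str, set[Triple]], kg: set[Triple]) -> str:
    """Pick the option whose filled-in sentence yields the most truthful graph."""
    return max(options, key=lambda o: truthfulness(options[o], kg))

# Each option stands for the triples an RE model would extract from the
# question sentence with that option filled into the blank.
options = {
    "A": {("Kyoto", "locatedIn", "Japan")},   # supported by the KG
    "B": {("Kyoto", "capitalOf", "Japan")},   # absent, hence false
}
print(answer_mcq(options, REFERENCE_KG))  # -> "A"
```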
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.05)
- North America > United States > Hawaii (0.04)
- (5 more...)
- Government > Regional Government > North America Government > United States Government (1.00)
- Education (0.86)
Can Large Language Models Express Uncertainty Like Human?
Tao, Linwei, Yeh, Yi-Fan, Kai, Bo, Dong, Minjing, Huang, Tao, Lamb, Tom A., Yu, Jialin, Torr, Philip H. S., Xu, Chang
Large language models (LLMs) are increasingly used in high-stakes settings, where overconfident responses can mislead users. Reliable confidence estimation has been shown to enhance trust and task accuracy. Yet existing methods face practical barriers: logits are often hidden, multi-sampling is computationally expensive, and verbalized numerical uncertainty (e.g., giving a 0-100 score) deviates from natural communication. We revisit linguistic confidence (LC), where models express uncertainty through hedging language (e.g., probably, might), offering a lightweight and human-centered alternative. To advance this direction, we 1) release the first diverse, large-scale dataset of hedging expressions with human-annotated confidence scores, and 2) propose a lightweight mapper that converts hedges into confidence scores at near-zero cost. Building on these resources, we 3) conduct the first systematic study of LC across modern LLMs and QA benchmarks, revealing that while most LLMs underperform in expressing reliable LC, carefully designed prompting achieves competitive calibration and discriminability. Finally, we 4) introduce a fine-tuning framework that further improves LC reliability. Taken together, our work positions linguistic confidence as a scalable, efficient, and human-aligned approach to LLM uncertainty estimation, and calls for deeper exploration of this promising yet underexplored direction. The code and dataset are anonymously available at https://anonymous.

Large language models (LLMs) are increasingly deployed in real-world applications, from education and healthcare to law and scientific discovery. While their capabilities make them powerful assistants, LLMs are also prone to hallucinations and factual errors, and human overreliance on their outputs can lead to serious consequences. For instance, a U.S. lawyer once submitted fabricated cases generated by ChatGPT, resulting in professional sanctions (ABC News, 2023). Recent social experiments demonstrate that people adjust their reliance on AI depending on how confident the model appears: reliable expressions of uncertainty can enhance trust, satisfaction, and task accuracy (Kim et al., 2024; Xu et al., 2025). These findings highlight the importance of attaching reliable uncertainty estimates to LLM responses to support human decision-making. Ultimately, the conveyance of confidence plays a central role in shaping trust and guiding human-AI interaction. A growing body of work explores the extraction and representation of confidence in LLM outputs. Logit-based methods are simple and inexpensive but require access to model logits, which are typically unavailable in commercial LLM APIs. Verbalized numerical scores avoid that requirement, yet such scores rarely align with common user behavior or natural communication, as users do not typically phrase queries with explicit instructions like "Please output your confidence along with the answer."
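As a rough illustration of the kind of hedge-to-confidence mapper the abstract describes, the sketch below looks up hedging phrases in a score table; the phrase list and numbers are invented for illustration, and the paper's human-annotated dataset would supply real values.

```python
# Hypothetical sketch of a linguistic-confidence mapper: detect a hedging
# phrase in a model response and map it to a numeric confidence score.
# The phrase->score table is invented; real scores would come from the
# paper's human-annotated hedging dataset.
import re

HEDGE_SCORES = {
    "definitely": 0.95,
    "almost certainly": 0.90,
    "probably": 0.70,
    "might": 0.40,
    "unlikely": 0.20,
}

def linguistic_confidence(response: str, default: float = 0.5) -> float:
    """Return the score of the most confident hedge found, else a neutral default."""
    found = [
        score
        for phrase, score in HEDGE_SCORES.items()
        if re.search(rf"\b{re.escape(phrase)}\b", response, re.IGNORECASE)
    ]
    return max(found) if found else default

print(linguistic_confidence("The answer is probably Paris."))  # -> 0.7
```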
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Asia > China > Shanghai > Shanghai (0.04)
- (4 more...)
- Government > Regional Government > North America Government > United States Government (0.70)
- Media > Television (0.47)
SimpleQA Verified: A Reliable Factuality Benchmark to Measure Parametric Knowledge
Haas, Lukas, Yona, Gal, D'Antonio, Giovanni, Goldshtein, Sasha, Das, Dipanjan
We introduce SimpleQA Verified, a 1,000-prompt benchmark for evaluating Large Language Model (LLM) short-form factuality based on OpenAI's SimpleQA. It addresses critical limitations in OpenAI's benchmark, including noisy and incorrect labels, topical biases, and question redundancy. SimpleQA Verified was created through a rigorous multi-stage filtering process involving de-duplication, topic balancing, and source reconciliation to produce a more reliable and challenging evaluation set, alongside improvements in the autorater prompt. On this new benchmark, Gemini 2.5 Pro achieves a state-of-the-art F1-score of 55.6, outperforming other frontier models, including GPT-5. This work provides the research community with a higher-fidelity tool to track genuine progress in parametric model factuality and to mitigate hallucinations. The benchmark dataset, evaluation code, and leaderboard are available at: https://www.kaggle.com/benchmarks/deepmind/simpleqa-verified.
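For context on the headline metric, here is a small sketch of an F1 computation in the style used for SimpleQA-like benchmarks, assuming each answer is graded correct, incorrect, or not attempted, and that F1 is the harmonic mean of overall accuracy and accuracy on attempted questions; the grade counts are made up.

```python
# Sketch of a SimpleQA-style F1 over graded answers (assumed definition:
# harmonic mean of overall accuracy and accuracy given attempted).

def f1_score(grades: list[str]) -> float:
    n = len(grades)
    correct = grades.count("correct")
    attempted = n - grades.count("not_attempted")
    overall = correct / n if n else 0.0
    given_attempted = correct / attempted if attempted else 0.0
    if overall + given_attempted == 0:
        return 0.0
    return 2 * overall * given_attempted / (overall + given_attempted)

# Toy grade list: 2 correct, 1 incorrect, 1 abstention.
grades = ["correct", "correct", "incorrect", "not_attempted"]
print(round(f1_score(grades) * 100, 1))  # -> 57.1
```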
- North America > United States > California > San Francisco County > San Francisco (0.14)
- South America > Colombia (0.04)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- (7 more...)
- Leisure & Entertainment (1.00)
- Government (0.69)
- Media > Television (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)
Too Consistent to Detect: A Study of Self-Consistent Errors in LLMs
Tan, Hexiang, Sun, Fei, Liu, Sha, Su, Du, Cao, Qi, Chen, Xin, Wang, Jingang, Cai, Xunliang, Wang, Yuanzhuo, Shen, Huawei, Cheng, Xueqi
As large language models (LLMs) often generate plausible but incorrect content, error detection has become increasingly critical to ensure truthfulness. However, existing detection methods often overlook a critical problem we term self-consistent errors, where an LLM repeatedly generates the same incorrect response across multiple stochastic samples. This work formally defines self-consistent errors and evaluates mainstream detection methods on them. Our investigation reveals two key findings: (1) Unlike inconsistent errors, whose frequency diminishes significantly as LLM scale increases, the frequency of self-consistent errors remains stable or even increases. (2) All four types of detection methods we evaluate struggle significantly to detect self-consistent errors. These findings reveal critical limitations in current detection methods and underscore the need for improvement. Motivated by the observation that self-consistent errors often differ across LLMs, we propose a simple but effective cross-model probe method that fuses hidden-state evidence from an external verifier LLM. Our method significantly enhances performance on self-consistent errors across three LLM families.
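A minimal sketch of what a cross-model probe could look like, assuming the fusion is a simple concatenation of hidden states followed by a linear probe; the random features, dimensions, and labels below are stand-ins, not the paper's actual setup.

```python
# Hypothetical sketch of a cross-model probe: concatenate hidden states from
# the generator LLM and an external verifier LLM, then train a linear probe
# to flag errors. The hidden states here are random stand-ins; in practice
# they would come from the two models' forward passes on the same response.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d_gen, d_ver = 200, 64, 64

h_generator = rng.normal(size=(n, d_gen))   # generator hidden states
h_verifier = rng.normal(size=(n, d_ver))    # verifier hidden states
labels = rng.integers(0, 2, size=n)         # 1 = response is an error

features = np.concatenate([h_generator, h_verifier], axis=1)  # fusion step
probe = LogisticRegression(max_iter=1000).fit(features, labels)
print("train accuracy:", probe.score(features, labels))
```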
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (4 more...)
American citizen killed in Russian attack on Kyiv, State Department confirms
An American citizen was among 15 people killed in Russian drone and missile strikes on the Ukrainian capital, Kyiv, on Tuesday, State Department spokesperson Tammy Bruce confirmed at a press conference Wednesday. In response to a reporter's question about U.S. diplomats in Kyiv having to spend the night in a bunker, Bruce said, "we can confirm the death of a U.S. citizen in Ukraine." "We are aware of last night's attack on Kyiv that resulted in numerous casualties, including the tragic death of a U.S. citizen," she said, adding, "We condemn those strikes and extend our deepest condolences to the victims and to the families of all those affected." Bruce did not offer further details on the identity of the citizen killed by the Russian strikes, citing "respect to the family during this obviously horrible time."
- Europe > Ukraine > Kyiv Oblast > Kyiv (1.00)
- Asia > Russia (1.00)
- North America > Canada (0.34)
- (6 more...)
Tuning LLM Judge Design Decisions for 1/1000 of the Cost
Salinas, David, Swelam, Omar, Hutter, Frank
Evaluating Large Language Models (LLMs) often requires costly human annotations. To address this, LLM-based judges have been proposed: they compare the outputs of two LLMs, enabling the ranking of models without human intervention. While several approaches have been proposed, many confounding factors are present across papers; for instance, the model, the prompt, and other hyperparameters are typically changed at the same time, making apples-to-apples comparisons challenging. In this paper, we systematically analyze and tune the hyperparameters of LLM judges. To alleviate the high cost of evaluating a judge, we leverage multi-objective, multi-fidelity optimization, which finds judges that trade off accuracy against cost and also significantly reduces the cost of the search. Our method identifies judges that not only outperform existing benchmarks in accuracy and cost-efficiency but also rely on open-weight models, ensuring greater accessibility and reproducibility.
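The sketch below illustrates one way such a multi-objective, multi-fidelity search could be organized: cheap low-fidelity screening, promotion of survivors to a full evaluation, and a Pareto front over accuracy and cost. The mock evaluate function, fidelity sizes, and configuration grid are assumptions for illustration, not the paper's tuner.

```python
# Hypothetical sketch of multi-objective, multi-fidelity tuning of an LLM
# judge. evaluate() is a mock; a real run would measure the judge's
# agreement with human annotations and its API cost.
import itertools, random

random.seed(0)

def evaluate(config: dict, n_examples: int) -> tuple[float, float]:
    """Mock judge evaluation: returns (agreement with humans, dollar cost)."""
    accuracy = random.uniform(0.5, 0.9)
    cost = n_examples * config["price_per_example"]
    return accuracy, cost

configs = [
    {"model": m, "prompt": p, "price_per_example": c}
    for m, p, c in itertools.product(
        ["judge-small", "judge-large"], ["pairwise", "rubric"], [0.001, 0.01]
    )
]

# Low fidelity: cheap screening on 50 examples; keep the top half.
screened = sorted(configs, key=lambda c: evaluate(c, 50)[0], reverse=True)
survivors = screened[: len(screened) // 2]

# High fidelity: full evaluation of the survivors on 1000 examples.
results = [(c, *evaluate(c, 1000)) for c in survivors]

# Pareto front: keep configs not dominated on both accuracy and cost.
front = [
    (c, acc, cost)
    for c, acc, cost in results
    if not any(a2 >= acc and c2 <= cost and (a2, c2) != (acc, cost)
               for _, a2, c2 in results)
]
for c, acc, cost in front:
    print(c["model"], c["prompt"], f"acc={acc:.2f}", f"cost=${cost:.2f}")
```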
- Europe > Germany > Baden-Württemberg > Freiburg (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (2 more...)
5 likely choices for who really ran the disastrous Biden White House
For years, conservative media, lawmakers and talking heads have been sounding the alarm about President Joe Biden's cognitive free fall. And for years, left-wing media, lawmakers and their loyal mouthpieces waved it off with the same condescending dismissal -- accusing us of lying, fear-mongering or worse. Some even went so far as to say they couldn't keep up with Biden's supposed brilliance and jam-packed schedule of what was mostly just one morning briefing and two mid-afternoon naps. Now that Biden has shuffled out of office, left-wing media seems to be waking up to the glaringly obvious. The New York Times of all places -- yes, the same paper that acted as Biden's PR firm -- has revealed that he relied on teleprompters during intimate fundraisers in private homes.
- North America > United States (1.00)
- Asia > Afghanistan (0.05)
- Europe (0.05)
- (2 more...)
- Government > Regional Government > North America Government > United States Government (1.00)
- Government > Military (0.96)
Harris' 'ice princess' demeanor, Bush's belly-tap were key expressions at Jimmy Carter's funeral: expert
Presidents Clinton, George W. Bush, Obama, Biden and Trump all paid their respects to Jimmy Carter at his state funeral in Washington, D.C. During the 2024 campaign cycle, Americans witnessed what appeared to be no love lost between President-elect Donald Trump and former President Barack Obama. At former President Jimmy Carter's funeral, however, the two appeared to be enjoying each other's company and largely ignored other dignitaries arriving around them, including Vice President Kamala Harris and President Biden. Susan Constantine, a communication and body language expert, said Harris came off "as cool as could be," adding, "When she was walking she was very robotic."
- North America > United States > District of Columbia > Washington (0.25)
- North America > United States > New York (0.06)
- North America > United States > Pennsylvania (0.05)
Characteristics of Political Misinformation Over the Past Decade
Although misinformation tends to spread online, it can have serious real-world consequences. To develop automated tools that detect and mitigate the impact of misinformation, researchers must leverage algorithms that can adapt to the modality (text, images, and video), the source, and the content of the false information. However, these characteristics tend to change dynamically over time, making it challenging to develop robust algorithms to fight misinformation spread. This paper therefore uses natural language processing to find common characteristics of political misinformation over a twelve-year period. The results show that misinformation has increased dramatically in recent years and is increasingly shared from sources whose primary modalities are text and images (e.g., Facebook and Instagram), although video-sharing sources containing misinformation are starting to increase (e.g., TikTok). Moreover, statements expressing misinformation were found to contain more negative sentiment than accurate information. However, the sentiment associated with both accurate and inaccurate information has trended downward, indicating a generally more negative tone in political statements over time. Finally, recurring misinformation categories were uncovered that persist across years, which may imply that people tend to share inaccurate statements about information they fear or don't understand (Science and Medicine, Crime, Religion), information that impacts them directly (Policy, Election Integrity, Economy), or public figures who are salient in their daily lives. Together, it is hoped that these insights will assist researchers in developing algorithms that are temporally invariant and capable of detecting and mitigating misinformation across time.
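To make the trend analysis concrete, here is a toy sketch of computing mean sentiment per year for accurate versus inaccurate statements; the lexicon-based scorer and the records are invented stand-ins for the fact-checked corpus and the sentiment model a real study would use.

```python
# Toy sketch: mean sentiment per (year, accuracy) bucket over a statement
# corpus. The word lists and records below are invented for illustration.
from collections import defaultdict

NEGATIVE = {"disaster", "fraud", "fear", "crisis"}
POSITIVE = {"growth", "safe", "success", "honest"}

def sentiment(text: str) -> int:
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

records = [  # (year, is_accurate, statement text)
    (2013, False, "election fraud crisis looms"),
    (2013, True, "economy shows steady growth"),
    (2024, False, "total disaster and fraud everywhere"),
    (2024, True, "jobs report shows growth despite fear"),
]

buckets = defaultdict(list)
for year, accurate, text in records:
    buckets[(year, accurate)].append(sentiment(text))

for (year, accurate), scores in sorted(buckets.items()):
    label = "accurate" if accurate else "misinformation"
    print(year, label, sum(scores) / len(scores))
```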
- North America > United States > New York (0.04)
- North America > United States > Nevada > Clark County > Las Vegas (0.04)
- North America > United States > Massachusetts (0.04)
- (4 more...)
- Media > News (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
Measuring short-form factuality in large language models
Wei, Jason, Nguyen, Karina, Chung, Hyung Won, Jiao, Yunxin Joy, Papay, Spencer, Glaese, Amelia, Schulman, John, Fedus, William
An open problem in artificial intelligence is how to train language models that produce responses that are factually correct. Current frontier models sometimes produce false outputs or answers that are not substantiated by evidence, a problem known as "hallucinations." Such hallucinations are one of the major barriers to broader adoption of general forms of artificial intelligence like large language models. Factuality is a complicated topic because it is hard to measure: evaluating the factuality of an arbitrary claim can be challenging, and language models often generate long completions that contain dozens of factual claims. In this work, we sidestep the open-endedness of language models by considering only short, fact-seeking questions with a single answer. This reduction of scope is important because it makes measuring factuality much more tractable, albeit at the cost of leaving open research questions such as whether improved behavior on short-form factuality generalizes to long-form factuality. We present a benchmark called SimpleQA, which contains 4,326 short, fact-seeking questions. SimpleQA was designed with a few important properties in mind, the first being high correctness: reference answers are determined by two independent AI trainers, and questions were written so that predicted answers are easily gradable.
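As a simplified stand-in for the grading contract described above (the actual benchmark uses a prompted LLM autorater), the sketch below grades a predicted answer against the reference as correct, incorrect, or not attempted; the string-normalization and matching rule are assumptions for illustration.

```python
# Simplified stand-in for SimpleQA-style grading. A real autorater is an
# LLM prompted with the question, reference answer, and prediction; here a
# string match approximates the same three-way grading contract.
import string

def normalize(text: str) -> str:
    """Lowercase and strip punctuation/whitespace for lenient matching."""
    return text.lower().translate(str.maketrans("", "", string.punctuation)).strip()

def grade(predicted: str, reference: str) -> str:
    if not predicted.strip():
        return "not_attempted"
    return "correct" if normalize(reference) in normalize(predicted) else "incorrect"

print(grade("It was Marie Curie.", "Marie Curie"))  # -> "correct"
print(grade("", "Marie Curie"))                     # -> "not_attempted"
```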
- North America > United States > California > San Francisco County > San Francisco (0.14)
- South America > Argentina (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > Netherlands (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)